Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts
Gupta, Raavi, Panicker, Pranav Hari, Bhatia, Sumit, Ramakrishnan, Ganesh
Large language models (LLMs), despite their remarkable text generation capabilities, often hallucinate and generate text that is factually incorrect and not grounded in real-world knowledge. This poses serious risks in domains like healthcare, finance, and customer support. A typical way to use LLMs is via the APIs provided by LLM vendors, where there is no access to model weights or options to fine-tune the model. Existing methods to detect hallucinations in such settings, where model access is restricted or constrained by resources, typically require making multiple LLM API calls, increasing latency and API cost. We introduce CONFACTCHECK, an efficient hallucination detection approach that does not rely on any external knowledge base and works on the simple intuition that responses to factual probes within the generated text should be consistent within a single LLM and across different LLMs. Rigorous empirical evaluation on multiple datasets covering both factual text generation and open-ended generation shows that CONFACTCHECK detects hallucinated facts efficiently, using fewer resources and achieving higher accuracy scores than existing baselines that operate under similar conditions. Our code is available here.
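The consistency intuition described in the abstract can be illustrated with a minimal sketch. The majority-agreement check and the `threshold` parameter below are illustrative assumptions, not the authors' implementation: a fact from the generated text is re-asked as a factual probe, and disagreement across repeated answers flags a likely hallucination.

```python
from collections import Counter

def consistent(answers, threshold=0.5):
    """Return True if a majority of probe answers agree.

    `answers` holds responses to the same factual probe, gathered
    from one LLM sampled repeatedly or from several LLMs. If no
    answer wins a clear majority, the fact is flagged as a likely
    hallucination.
    """
    counts = Counter(a.strip().lower() for a in answers)
    _, freq = counts.most_common(1)[0]
    return freq / len(answers) > threshold

# Probe "In which year was the company founded?" several times:
assert consistent(["1998", "1998", "1998", "1999"])      # mostly agree -> keep
assert not consistent(["1998", "2001", "1995", "1999"])  # disagree -> flag
```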
Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study
Jain, Amay, Cui, Liu, Chen, Si
Large language models like ChatGPT are increasingly used in classrooms, but they often provide outdated or fabricated information that can mislead students. Retrieval Augmented Generation (RAG) improves the reliability of LLMs by grounding responses in external resources. We investigate two accessible RAG paradigms, vector-based retrieval and graph-based retrieval, to identify best practices for classroom question answering (QA). Existing comparative studies fail to account for pedagogical factors such as educational disciplines, question types, and practical deployment costs. Using a novel dataset, EduScopeQA, of 3,176 questions across academic subjects, we measure performance on various educational query types, from specific facts to broad thematic discussions. We also evaluate system alignment with a dataset of systematically altered textbooks that contradict the LLM's latent knowledge. We find that OpenAI Vector Search RAG (representing vector-based RAG) performs well as a low-cost generalist, especially for quick fact retrieval. GraphRAG Global, on the other hand, excels at providing pedagogically rich answers to thematic queries, and GraphRAG Local achieves the highest accuracy on the dense, altered textbooks when corpus integrity is critical. Accounting for the 10-20x higher resource usage of GraphRAG (representing graph-based RAG), we show that a dynamic branching framework that routes queries to the optimal retrieval method boosts fidelity and efficiency. These insights provide actionable guidelines for educators and system designers seeking to integrate RAG-augmented LLMs into learning environments effectively.
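The dynamic branching framework can be sketched as a simple query router. The keyword heuristics below are illustrative assumptions, not the paper's actual classifier; they only show the routing idea the abstract describes, with each branch matched to the backend the study found strongest for that query type.

```python
def route_query(question: str) -> str:
    """Route a classroom question to a retrieval backend.

    Heuristic sketch: short fact-seeking questions go to cheap
    vector search; broad thematic prompts go to GraphRAG Global;
    everything else defaults to GraphRAG Local, the most accurate
    option on dense or altered corpora.
    """
    q = question.lower()
    if any(w in q for w in ("when", "who", "what year", "where")):
        return "vector"           # low-cost generalist, quick facts
    if any(w in q for w in ("discuss", "compare", "themes", "why")):
        return "graphrag-global"  # pedagogically rich thematic answers
    return "graphrag-local"       # highest accuracy on altered texts

assert route_query("When did the French Revolution begin?") == "vector"
assert route_query("Discuss the themes of the novel") == "graphrag-global"
```

A production router would more plausibly use a learned classifier, but even this keyword sketch captures the fidelity/cost trade-off the authors report.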
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
Nguyen, Tin, Bolton, Logan, Taesiri, Mohammad Reza, Nguyen, Anh Totti
An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response that mixes factual and non-factual statements is hard for humans to verify and to base their decisions on accurately. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the query. That is, given an input question, the LLM first re-formats the question to add XML tags highlighting key facts, and then generates a response with highlights over the facts referenced from the input. Interestingly, in few-shot settings, HoT outperforms vanilla chain-of-thought prompting (CoT) on a wide range of 17 tasks, from arithmetic and reading comprehension to logical reasoning. When asking humans to verify LLM responses, highlights help time-limited participants recognize more accurately and efficiently when LLMs are correct. Yet, surprisingly, when LLMs are wrong, HoT tends to make users believe that an answer is correct.
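A toy version of the question re-formatting step can be sketched as follows. The numbered `<fact1>`-style tags follow the XML-highlight idea in the abstract, but the exact tag schema is an assumption, and in HoT the tagging is done by the LLM itself rather than by string replacement.

```python
def highlight(question: str, facts: list[str]) -> str:
    """Wrap each key fact in numbered XML tags, HoT-style, so a
    response can reference grounded spans from the input."""
    for i, fact in enumerate(facts, start=1):
        question = question.replace(fact, f"<fact{i}>{fact}</fact{i}>")
    return question

tagged = highlight("Alice bought 3 apples and 2 pears.", ["3 apples", "2 pears"])
assert tagged == "Alice bought <fact1>3 apples</fact1> and <fact2>2 pears</fact2>."
```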
FactFlow: Automatic Fact Sheet Generation and Customization from Tabular Dataset via AI Chain Design & Implementation
Vu, Minh Duc, Chen, Jieshan, Xing, Zhenchang, Lu, Qinghua, Xu, Xiwei, Fu, Qian
With the proliferation of data across various domains, there is a critical demand for tools that enable non-experts to derive meaningful insights without deep data analysis skills. To address this need, existing automatic fact sheet generation tools offer heuristic-based solutions to extract facts and generate stories. However, they inadequately grasp the semantics of data and struggle to generate narratives that fully capture the meaning of the dataset or align the fact sheet with specific user needs. Addressing these shortcomings, this paper introduces FactFlow, a novel tool designed for the automatic generation and customisation of fact sheets. FactFlow applies the concept of collaborative AI workers to transform raw tabular datasets into comprehensive, visually compelling fact sheets. We define an effective taxonomy for profiling AI workers for specialised tasks. Furthermore, FactFlow empowers users to refine these fact sheets through intuitive natural language commands, ensuring the final outputs align closely with individual preferences and requirements. Our user evaluation with 18 participants confirms that FactFlow not only surpasses state-of-the-art baselines in automated fact sheet production but also provides a positive user experience during customization tasks.
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
Li, Haitao, Ai, Qingyao, Han, Xinyan, Chen, Jia, Dong, Qian, Liu, Yiqun, Chen, Chong, Tian, Qi
Recent research demonstrates the effectiveness of using pre-trained language models for legal case retrieval. Most existing works focus on improving the representation ability of the contextualized embedding of the [CLS] token and calculate relevance using textual semantic similarity. However, in the legal domain, textual semantic similarity does not always imply that the cases are relevant enough. Instead, relevance in legal cases primarily depends on the similarity of key facts that impact the final judgment. Without proper treatment, the discriminative ability of learned representations can be limited, since legal cases are lengthy and contain numerous non-key facts. To this end, we introduce DELTA, a discriminative model designed for legal case retrieval. The basic idea involves pinpointing key facts in legal cases and pulling the contextualized embedding of the [CLS] token closer to the key facts while pushing it away from the non-key facts, which can warm up the case embedding space in an unsupervised manner. Specifically, this study brings the word alignment mechanism to the contextual masked auto-encoder. First, we leverage shallow decoders to create information bottlenecks, aiming to enhance representation ability. Second, we employ a deep decoder to enable translation between different structures, with the goal of pinpointing key facts to enhance discriminative ability. Comprehensive experiments conducted on publicly available legal benchmarks show that our approach outperforms existing state-of-the-art methods in legal case retrieval. It provides a new perspective on the in-depth understanding and processing of legal case documents.
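The pull/push idea behind warming up the case embedding space can be sketched as a toy contrastive objective. The cosine-based formulation, the hinge on the push term, and the toy 2-d vectors below are illustrative assumptions, not DELTA's exact training loss.

```python
from math import sqrt

def cos(a, b):
    """Cosine similarity of two dense vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def pull_push_loss(cls_vec, key_facts, non_key_facts):
    """Toy objective in the spirit of DELTA's warm-up: the [CLS]
    embedding is attracted to key-fact embeddings (pull) and
    repelled from non-key-fact embeddings (push)."""
    pull = sum(1 - cos(cls_vec, k) for k in key_facts) / len(key_facts)
    push = sum(max(0.0, cos(cls_vec, n)) for n in non_key_facts) / len(non_key_facts)
    return pull + push

# A [CLS] vector aligned with the key fact scores lower (better):
aligned = pull_push_loss([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])
misaligned = pull_push_loss([0.0, 1.0], [[1.0, 0.0]], [[0.0, 1.0]])
assert aligned < misaligned
```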
ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts
Mukherjee, Rajdeep, Bohra, Abhinav, Banerjee, Akash, Sharma, Soumya, Hegde, Manjunath, Shaikh, Afreen, Shrivastava, Shivani, Dasgupta, Koustuv, Ganguly, Niloy, Ghosh, Saptarshi, Goyal, Pawan
Despite tremendous progress in automatic summarization, state-of-the-art methods are predominantly trained to excel at summarizing short newswire articles, or documents with strong layout biases such as scientific articles or government reports. Efficient techniques to summarize financial documents, including facts and figures, have largely been unexplored, mainly due to the unavailability of suitable datasets. In this work, we present ECTSum, a new dataset with transcripts of earnings calls (ECTs), hosted by publicly traded companies, as documents, and short, expert-written, telegram-style bullet point summaries derived from corresponding Reuters articles. ECTs are long unstructured documents without any prescribed length limit or format. We benchmark our dataset with state-of-the-art summarizers across various metrics evaluating the content quality and factual consistency of the generated summaries. Finally, we present a simple-yet-effective approach, ECT-BPS, to generate a set of bullet points that precisely capture the important facts discussed in the calls.
Global Economic Impact of AI: Facts and Figures
Wall Street, venture capitalists, technology executives, data scientists -- all have important reasons to understand the growth and opportunity in the artificial intelligence market in order to assess business growth and opportunities. This gives them insight into funds invested in AI and analytics, as well as potential revenue growth and turnover. Indeed, the growth of AI, continuing research, the development of easier open-source libraries, and applications in small- to large-scale industries are sure to revolutionize the industry over the next two decades, and the impact is being felt in almost all countries worldwide. To dive deep into the growth of AI and future trends, an insight into the type and size of the market is essential, along with (a) AI-related industry market research forecasts and (b) data from reputable research sources for insight into AI valuation and forecasting. IBM's CEO claims a potential $2 trillion market for "cognitive computing".
Artificial Intelligence in Cancer: How Is It Used in Practice? - Cancer Therapy Advisor
Artificial intelligence (AI) comprises a type of computer science that develops entities, such as software programs, that can intelligently perform tasks or make decisions [1]. The development and use of AI in health care is not new; the first ideas that created the foundation of AI were documented in 1956, and automated clinical tools that were developed between the 1970s and 1990s are now in routine use. These tools, such as the automated interpretation of electrocardiograms, may seem simple, but are considered AI. Today, AI is being harnessed to help with "big" problems in medicine -- such as processing and interpreting large amounts of data in research and in clinical settings, including reading imaging or results from broad genetic-testing panels [1]. In oncology, AI is not yet being used broadly, but its use is being studied in several areas.
Artificial Intelligence Fact Sheet - Content Science Review
Content Science is a content strategy and intelligence firm based in Atlanta, GA. Founded in 2010 by Colleen Jones, author of Clout: The Art and Science of Influential Web Content, our mission is to transform industries, organizations, and individuals for the better by putting content first. We offer professional services, publications, and software for clients ranging from Fortune 50 companies to nonprofits to government agencies.
Question Answering from Frequently Asked Question Files: Experiences with the FAQ FINDER System
Burke, Robin D., Hammond, Kristian J., Kulyukin, Vladimir, Lytinen, Steven L., Tomuro, Noriko, Schoenberg, Scott
This article describes FAQ FINDER, a natural language question-answering system that uses files of frequently asked questions as its knowledge base. Unlike AI question-answering systems that focus on the generation of new answers, FAQ FINDER retrieves existing ones found in frequently asked question files. Unlike information-retrieval approaches that rely on a purely lexical metric of similarity between query and document, FAQ FINDER uses a semantic knowledge base (WORDNET) to improve its ability to match question and answer. We include results from an evaluation of the system's performance and show that a combination of semantic and statistical techniques works better than any single approach.
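The combination of statistical and semantic matching can be sketched as a weighted blend of two scores. The equal weights and the stubbed semantic score below are assumptions for illustration; FAQ FINDER itself derives the semantic signal from WORDNET rather than a fixed constant.

```python
def combined_score(lexical_sim, semantic_sim, w_lex=0.5, w_sem=0.5):
    """FAQ FINDER-style matching: blend a term-overlap metric with a
    semantic similarity metric instead of relying on either alone."""
    return w_lex * lexical_sim + w_sem * semantic_sim

def lexical(query, faq_question):
    """Purely lexical similarity: Jaccard overlap of word sets."""
    qs = set(query.lower().split())
    fs = set(faq_question.lower().split())
    return len(qs & fs) / len(qs | fs)

q = "how do I install the program"
faq = "how to install the software"
# A semantic layer (e.g. WORDNET synonymy) would treat "program"
# and "software" as close; we stub that judgment with 0.8 here.
score = combined_score(lexical(q, faq), semantic_sim=0.8)
assert 0.0 < score <= 1.0
```

Even this sketch shows why the blend helps: the lexical metric alone misses the "program"/"software" match, while the semantic term recovers it.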